DETAILED ACTION

1. This office action is in response to correspondence filed August 1, 2007 in
reference to application 10/731,929. Claims 1-9 and 12-14 are pending in the
application and have been examined.

Response to Amendment 

2. The amendments to the claims filed August 1, 2007, have been examined and
considered in this office action. Claims 10 and 11 have been cancelled, and claims
12-14 have been added.

Response to Arguments 

3. Applicant's arguments filed August 1, 2007 have been fully considered but they
are not persuasive.

4. With regard to claims 1, 9 and 12, in response to applicant's argument that the
references fail to show certain features of applicant's invention, it is noted that the
features upon which applicant relies (i.e., the details of what exactly is the "Absolute
Loudness") are not recited in the rejected claim(s). Although the claims are interpreted
in light of the specification, limitations from the specification are not read into the claims.
See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). For the
purposes of this office action, absolute loudness will be interpreted as the amplitude of
the received speech.

5. With regard to claim 3, that Gable et al. is not combinable with Lee et al., the
examiner respectfully disagrees. Gable clearly uses parameters including amplitude of
the signal to verify speaker identity (paragraph 0027). This is similar to the system of Lee,
and therefore it is combinable. In response to applicant's argument that the examiner's
conclusion of obviousness is based upon improper hindsight reasoning, it must be 
recognized that any judgment on obviousness is in a sense necessarily a reconstruction 
based upon hindsight reasoning. But so long as it takes into account only knowledge 
which was within the level of ordinary skill at the time the claimed invention was made, 
and does not include knowledge gleaned only from the applicant's disclosure, such a 
reconstruction is proper. See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 
1971). 

6. With regard to claim 6, that Brandstein does not teach how to determine the
absolute loudness using auditory and binaural processing, the examiner respectfully
disagrees. The relationship on page 21 relates amplitude as a function of distance from
the source, which is usable to determine the amplitude at the source. Given the
detected amplitude and the source location, which can be detected using the source 
localization method described throughout the paper, one of ordinary skill in the art could 
clearly determine the amplitude of the sound at the source. 
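
For illustration only, the determination described above can be sketched as follows. The sketch assumes a free-field 1/r amplitude decay combined with a cardioid directivity term; neither formula is quoted from Brandstein, and the function name, parameter names and the 1 m reference distance are hypothetical.

```python
import math


def estimate_source_amplitude(measured_amplitude, distance_m,
                              angle_rad=0.0, ref_distance_m=1.0):
    """Scale a measured amplitude back to a reference distance from the source.

    Assumed model (illustrative, not taken from the cited reference):
    amplitude decays as 1/r and is weighted by a cardioid directivity
    0.5 * (1 + cos(angle)).
    """
    if distance_m <= 0 or ref_distance_m <= 0:
        raise ValueError("distances must be positive")
    directivity = 0.5 * (1.0 + math.cos(angle_rad))
    if directivity < 1e-9:
        # The assumed cardioid radiates nothing in this direction.
        raise ValueError("microphone lies in the null of the assumed cardioid")
    # Invert measured = source * (ref / r) * directivity.
    return measured_amplitude * (distance_m / ref_distance_m) / directivity
```

Under these assumptions, an amplitude of 0.5 detected on-axis at 2 m corresponds to an amplitude of 1.0 at the 1 m reference distance.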

7. With regard to claim 8, that Brandstein does not teach using the arrival times of
the speech signal between two or more microphones, the examiner respectfully
disagrees. The quote used by the applicant from Brandstein has been misunderstood.
The applicant uses "There is no attempt made to define time-difference of arrival
(TDOA) values relative to a single reference sensor or an absolute scale;" page 3, line
1. This is merely referring to one way TDOA can be obtained. Throughout the paper,
for example T() in equation 5 on page 6, TDOA estimates are used in the method of
determining the location of the source. Therefore it is clear that Brandstein does in fact
use TDOA in determining location.
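
As a hedged illustration of what a TDOA estimate between two microphones is, the sketch below picks the lag that maximizes a brute-force cross-correlation of two sampled signals. This is a generic textbook approach and is not asserted to be the estimator used by Brandstein; the function names and the 343 m/s speed of sound are assumptions.

```python
def estimate_tdoa(sig_a, sig_b, sample_rate_hz, max_lag=None):
    """Return the delay (seconds) of sig_b relative to sig_a.

    A positive result means the sound reached microphone B later than
    microphone A. Brute-force cross-correlation, for illustration only.
    """
    n = min(len(sig_a), len(sig_b))
    if max_lag is None:
        max_lag = n - 1
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate_hz


def path_difference_m(tdoa_s, speed_of_sound_mps=343.0):
    """Convert a TDOA into a difference in source-to-microphone path length."""
    return tdoa_s * speed_of_sound_mps
```

A set of such pairwise path differences is the kind of input a source localization method can work from; a single pair constrains the source position but does not by itself fix a distance.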



Claim Rejections - 35 USC § 112

8. The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

9. Claim 14 is rejected under 35 U.S.C. 112, second paragraph, as being indefinite
for failing to particularly point out and distinctly claim the subject matter which applicant
regards as the invention. Claim 14 recites the limitation "said microphone" in line 6 of
the claim. However, it is unclear as to which microphone this applies, as the claim
previously mentions only "two or more microphones."



Claim Rejections - 35 USC § 102

10. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless -

(b) the invention was patented or described in a printed publication in this or a foreign country or in public 
use or on sale in this country, more than one year prior to the date of application for patent in the United 
States. 

11. Claims 1, 2, 9, and 12 are rejected under 35 U.S.C. 102(b) as being
anticipated by Lee et al. (Recognition of Negative Emotions from the Speech Signal).

12. Consider claim 1, Lee teaches a method for processing speech (This paper
reports on methods for automatic classification of spoken utterances based on the 
emotional state of the speaker; page 240, column 2, lines 3-4.), comprising the steps of: 

receiving a speech input of a speaker (The speech data used in the experiments 
was obtained from real users engaged in a spoken dialog with a machine agent over the 
telephone; page 241, column 1, lines 5-7.), 

generating speech parameters from said speech input (In our experiments, we 
computed only acoustic features such as pitch and energy related features from the 
speech signal; page 241, column 2, lines 46-47.), 

determining parameters describing an absolute loudness of said speech input
(The acoustic features chosen for emotion recognition comprised utterance-level
statistics obtained from the pitch and energy information of the signal. These included
mean, median, standard deviation, maximum and minimum for energy; page 241,
column 1, lines 57-61. Energy is the amplitude, and therefore the loudness of the
signal.),

evaluating said speech input and/or said speech parameters using said
parameters describing the absolute loudness (This paper reports on methods for
automatic classification of spoken utterances based on the emotional state of the
speaker; using utterance level features; page 240, column 2, lines 3-13.).
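
The utterance-level energy statistics named in the passage quoted from Lee (mean, median, standard deviation, maximum and minimum) can be illustrated with the minimal sketch below. The framing scheme (non-overlapping frames, energy as a sum of squared samples) and all names are assumptions made for illustration; Lee's actual feature extraction is not reproduced here.

```python
import statistics


def energy_statistics(samples, frame_len):
    """Return utterance-level statistics of per-frame energy.

    Assumption: energy of a frame is the sum of its squared samples, and
    frames do not overlap. A trailing partial frame is ignored.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    energies = [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if not energies:
        raise ValueError("utterance is shorter than one frame")
    return {
        "mean": statistics.mean(energies),
        "median": statistics.median(energies),
        "std": statistics.pstdev(energies),
        "max": max(energies),
        "min": min(energies),
    }
```

Because these statistics are computed from the signal as received, they vary with the distance between speaker and microphone unless the energy is first normalized.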

13. Consider claim 2, Lee teaches a method according to claim 1, wherein the step
of evaluation comprises a step of emotion recognition (This paper reports on methods 
for automatic classification of spoken utterances based on the emotional state of the 
speaker; page 240, column 2, lines 3-4.). 

14. Consider claim 9, Lee teaches a speech processing system (This paper reports 
on methods for automatic classification of spoken utterances based on the emotional 
state of the speaker; page 240, column 2, lines 3-4.), configured to: 

receive a speech input of a speaker (The speech data used in the experiments 
was obtained from real users engaged in a spoken dialog with a machine agent over the 
telephone; page 241, column 1, lines 5-7.), 

generate speech parameters from said speech input (In our experiments, we 
computed only acoustic features such as pitch and energy related features from the 
speech signal; page 241, column 2, lines 46-47.), 

determine parameters describing an absolute loudness of said speech input (The
acoustic features chosen for emotion recognition comprised utterance-level statistics
obtained from the pitch and energy information of the signal. These included mean,
median, standard deviation, maximum and minimum for energy; page 241, column 1,
lines 57-61. Energy is based on the amplitude, and therefore the loudness of the signal.),

evaluate said speech input and/or said speech parameters using said
parameters describing the absolute loudness (This paper reports on methods for
automatic classification of spoken utterances based on the emotional state of the
speaker; using utterance level features; page 240, column 2, lines 3-13.).

15. Consider claim 12, Lee teaches a computer readable medium encoded with a
computer program configured to cause a processor based device to execute the method
of (This paper reports on methods for automatic classification of spoken utterances
based on the emotional state of the speaker; page 240, column 2, lines 3-4. A computer
readable medium is inherent as this is computer based.):

receiving a speech input of a speaker (The speech data used in the experiments 
was obtained from real users engaged in a spoken dialog with a machine agent over the 
telephone; page 241, column 1, lines 5-7.), 

generating speech parameters from said speech input (In our experiments, we 
computed only acoustic features such as pitch and energy related features from the 
speech signal; page 241, column 2, lines 46-47.), 

determining parameters describing an absolute loudness of said speech input
(The acoustic features chosen for emotion recognition comprised utterance-level
statistics obtained from the pitch and energy information of the signal. These included
mean, median, standard deviation, maximum and minimum for energy; page 241,
column 1, lines 57-61. Energy is the amplitude, and therefore the loudness of the
signal.),

evaluating said speech input and/or said speech parameters using said
parameters describing the absolute loudness (This paper reports on methods for
automatic classification of spoken utterances based on the emotional state of the
speaker; using utterance level features; page 240, column 2, lines 3-13.).



Claim Rejections - 35 USC § 103 

16. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

17. The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148
USPQ 459 (1966), that are applied for establishing a background for determining
obviousness under 35 U.S.C. 103(a) are summarized as follows:

1. Determining the scope and contents of the prior art.

2. Ascertaining the differences between the prior art and the claims at issue. 

3. Resolving the level of ordinary skill in the pertinent art. 

4. Considering objective evidence present in the application indicating 
obviousness or nonobviousness. 

18. Claim 3 is rejected under 35 U.S.C. 103(a) as being unpatentable over Lee in 
view of Gable et al. (US PAP 2005/0060153). 



19. Consider claim 3, Lee teaches the method according to claim 1 but does not 
specifically teach wherein the step of evaluation comprises a step of speaker 
identification.

In the same field of speech processing, Gable teaches a step of speaker
identification using similar acoustic features as described by Lee (Verification 
parameters represent the individuality of the speaker, containing information about the 
timing, pitch, amplitude or spectral content of the speech; paragraph 0027. Abstract 
discusses using these features for speaker verification.). 

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to provide speaker identification as taught by Gable, with the speech 
processing of Lee in order to provide a method of further classifying a speech signal 
beyond emotional classification. 

20. Claims 4-8, 13 and 14 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Lee in view of Brandstein et al. (Microphone Array Localization Error
Estimation with Application to Sensor Placement).

21. Consider claim 4, Lee teaches a method according to claim 1 but does not
specifically teach wherein a microphone array comprising a plurality of microphones is 
used for determining said parameters describing the absolute loudness. 

In the same field of speech processing, Brandstein teaches using a microphone 
array comprising a plurality of microphones (see figure 6) for determining said 
parameters describing the absolute loudness (Existing array systems have been used in 
a number of applications. These include teleconferencing, speech recognition, speaker 
identification, speech acquisition in an automobile environment, sound capture in
reverberant enclosures, large room recordings, conferencing, acoustic surveillance, and
hearing aid devices; page 1, lines 11-15. Obviously, the array of microphones would be
used to determine the parameters including loudness needed for these applications.).

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to use a microphone array as taught by Brandstein with the speech 
processing system of Lee in order to provide a means for providing a high quality signal of
the desired speaker (Introduction, Brandstein.).

22. Consider claim 5, Lee teaches a method according to claim 1 but does not 
specifically teach wherein a location and/or distance of the speaker is determined. 

But in the same field of speech processing Brandstein teaches determining a 
location and/or distance of the speaker (Section 2 discusses using a microphone array 
with a time difference of arrival algorithm to determine a location of a speaker; pages 3-5.).

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to use a microphone array for source location as taught by Brandstein 
with the speech processing system of Lee in order to provide a means for providing a
high quality signal of the desired speaker (Introduction, Brandstein.).

23. Consider claim 6, Lee teaches a method according to claim 1 but does not 
specifically teach that the absolute loudness is determined using algorithms for auditory 
and/or binaural processing.

In the same field of speech signal processing, Brandstein teaches that the
absolute loudness is determined using algorithms for auditory and/or binaural 
processing (Page 21 teaches modeling a source as a cardioid radiator, wherein the 
source amplitude is a function of distance from the source. When this information is 
combined with the source locating algorithms of section 2, one can obviously estimate 
the amplitude at the source itself given the amplitude at the microphone array.). 

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to combine the method of level determination as suggested by 
Brandstein with the speech processing of Lee in order to provide a more accurate
representation of the actual level at the source, providing a better means for more
accurately categorizing a speech signal.

24. Consider claim 7, Brandstein teaches a method according to claim 5, wherein 
said absolute loudness is computed by normalizing a measured loudness, or energy by 
said distance (Page 21 provides a relationship of a source amplitude as a function of
distance and angle from the source. This relationship could obviously be used to
normalize an amplitude value to estimate the amplitude at the source.).

25. Consider claim 8, Brandstein teaches a method according to claim 5, wherein 
said distance is determined using the time delay of the speech input between said 
plurality of microphones (Sections 2 and 3 discuss using a microphone array with a time 
difference of arrival algorithm to determine a location of a speaker; pages 3-10.).

26. Consider claim 13, Lee teaches a method for processing speech, comprising:

receiving a speech signal of a speaker (The speech data used in the experiments 
was obtained from real users engaged in a spoken dialog with a machine agent over the 
telephone; page 241, column 1, lines 5-7.); 

generating speech parameters from said speech signal (In our experiments, we 
computed only acoustic features such as pitch and energy related features from the 
speech signal; page 241, column 2, lines 46-47.); and 

evaluating at least one of said speech signal and said speech parameters using 
the normalized loudness or energy (This paper reports on methods for automatic 
classification of spoken utterances based on the emotional state of the speaker; using 
utterance level features; page 240, column 2, lines 3-13.). 

However Lee does not specifically teach: 

determining a distance of the speaker based on a time delay of a respective 
arrival of said speech signal at two or more microphones; and 

normalizing a measured loudness or energy by said distance. 

In the same field of speech processing, Brandstein teaches determining a 
distance of the speaker based on a time delay of a respective arrival of said speech 
signal at two or more microphones (Sections 2 and 3 discuss using a microphone array 
with a time difference of arrival algorithm to determine a location of a speaker; pages 3-10.); and

normalizing a measured loudness or energy by said distance (Page 21 provides
a relationship of a source amplitude as a function of distance and angle from the
source. Although this relationship was given to model the source, one of ordinary skill
in the art at the time of the invention would have thought, given the location of the
source (as determined in the localization method discussed throughout Brandstein) and
the detected amplitude at the microphone array, to use the relationship to determine the
source amplitude).

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to use a microphone array for source location as taught by Brandstein 
with the speech processing system of Lee in order to provide a means for providing a
high quality signal of the desired speaker that is not adversely affected by the distance
from a speaker to the microphone array (Introduction, Brandstein.).

27. Consider claim 14, Lee teaches a system for emotion recognition and/or speaker 
identification, comprising: 

a data processor configured to generate speech parameters from said speech 
signal (In our experiments, we computed only acoustic features such as pitch and 
energy related features from the speech signal; page 241, column 2, lines 46-47.), and 

further configured to evaluate at least one of said speech signal and said speech 
parameters using the normalized loudness or energy (This paper reports on methods 
for automatic classification of spoken utterances based on the emotional state of the 
speaker; using utterance level features; page 240, column 2, lines 3-13.).

However Lee does not specifically teach:

at least two microphones configured to receive a speech signal; and 
a processor configured to determine a distance of the speaker based on a time 
delay of a respective arrival of said speech signal at said microphone, to normalize a 
measured loudness or energy by said distance. 

In the same field of speech processing Brandstein teaches: 
at least two microphones configured to receive a speech signal (see microphone 
array in figure 6); and 

a processor configured to determine a distance of the speaker based on a time 
delay of a respective arrival of said speech signal at said microphone (Sections 2 and 3 
discuss using a microphone array with a time difference of arrival algorithm to determine 
a location of a speaker; pages 3-10.), to normalize a measured loudness or energy by 
said distance (Page 21 provides a relationship of a source amplitude as a function of 
distance and angle from the source. Although this relationship was given to model the
source, one of ordinary skill in the art at the time of the invention would have thought, 
given the location of the source (as determined in the localization method discussed 
throughout Brandstein) and the detected amplitude at the microphone array, to use the 
relationship to determine the source amplitude). 

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to use a microphone array for source location as taught by Brandstein 
with the speech processing system of Lee in order to provide a means for providing a
high quality signal of the desired speaker that is not adversely affected by the distance
from a speaker to the microphone array (Introduction, Brandstein.).

Conclusion 

28. Applicant's amendment necessitated the new ground(s) of rejection presented in 
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP
§ 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 
CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Douglas C. Godbold whose telephone number is (571)
270-1451. The examiner can normally be reached Monday-Thursday, 7:00am-4:30pm,
and Friday, 7:00am-3:30pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Patrick Edouard, can be reached on (571) 272-7603. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 